Avoiding model selection bias in small-sample genomic datasets
Abstract
Motivation: Genomic datasets generated by high-throughput technologies are typically characterized by a moderate number of samples and a large number of measurements per sample. As a consequence, classification models are commonly compared based on resampling techniques. This investigation discusses the conceptual difficulties involved in comparative classification studies. Conclusions derived f...
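The abstract does not spell out a procedure, so the following is only an illustrative sketch of the general selection-bias effect that the title refers to. It is not the authors' method. It assumes a hypothetical noise-only dataset (30 samples, 200 features, random balanced labels) and treats each single-feature nearest-centroid classifier as one candidate "model". Reporting the leave-one-out (LOO) accuracy of the best candidate is optimistic, because the same resampling estimates are used both to pick the model and to score it. Repeating the selection inside every outer fold (nested cross-validation, a standard remedy named here by us, not by the source) keeps the held-out sample out of the selection step. All function names (`loo_accuracy`, `select_best_feature`, `nested_loo_accuracy`) are hypothetical.

```python
import random
import statistics

# Hypothetical noise-only dataset: the labels carry no information about X.
rng = random.Random(0)
N_SAMPLES, N_FEATURES = 30, 200
y = [0] * (N_SAMPLES // 2) + [1] * (N_SAMPLES - N_SAMPLES // 2)
rng.shuffle(y)
X = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(N_SAMPLES)]


def predict(value, m0, m1):
    """Nearest-centroid rule on one feature (ties go to class 0)."""
    return 0 if abs(value - m0) <= abs(value - m1) else 1


def loo_accuracy(x, labels):
    """Leave-one-out accuracy of a one-feature nearest-centroid classifier."""
    sums, counts = [0.0, 0.0], [0, 0]
    for xi, yi in zip(x, labels):
        sums[yi] += xi
        counts[yi] += 1
    correct = 0
    for xi, yi in zip(x, labels):
        # Drop the held-out sample from its own class centroid.
        s, c = sums[:], counts[:]
        s[yi] -= xi
        c[yi] -= 1
        correct += predict(xi, s[0] / c[0], s[1] / c[1]) == yi
    return correct / len(labels)


def select_best_feature(data, labels):
    """Model selection step: the feature whose classifier has the best LOO accuracy."""
    scores = [loo_accuracy([row[j] for row in data], labels) for j in range(len(data[0]))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def nested_loo_accuracy(data, labels):
    """Outer LOO loop; feature selection is repeated on each outer training set only."""
    correct = 0
    for i in range(len(labels)):
        train_x = data[:i] + data[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        j, _ = select_best_feature(train_x, train_y)  # never sees sample i
        m0 = statistics.fmean(r[j] for r, t in zip(train_x, train_y) if t == 0)
        m1 = statistics.fmean(r[j] for r, t in zip(train_x, train_y) if t == 1)
        correct += predict(data[i][j], m0, m1) == labels[i]
    return correct / len(labels)


if __name__ == "__main__":
    best, scores = select_best_feature(X, y)
    print(f"naive estimate, best of {N_FEATURES} candidates: {scores[best]:.2f}")
    print(f"nested LOO estimate: {nested_loo_accuracy(X, y):.2f}")
```

Because the labels are pure noise, any accuracy clearly above 0.5 is an artifact. The naive figure typically lands well above chance. The nested estimate should hover around chance level, although with only 30 samples it is itself highly variable.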
Similar resources
Selection bias in the LETOR datasets
The LETOR datasets consist of data extracted from traditional IR test corpora. For each of a number of test topics, a set of documents has been extracted, in the form of features of each document-query pair, for use by a ranker. An examination of the ways in which documents were selected for each topic shows that the selection has (for each of the three corpora) a particular bias or skewness. T...
Small sample bias and selection bias effects in multivariate calibration, exemplified for OLS and PLS regressions
In multivariate calibration by, for example, ordinary least squares (OLS) multiple regression or partial least squares regression (PLSR), the predictor ŷ(x) is perfect for the calibration sample itself, in the sense that the regression of observed y on predicted ŷ(x) is y = ŷ(x). Plots of y against ŷ(x) are widely used to illustrate how good the calibration is and how well prediction works. Usual...
Sample Selection Bias Correction Theory
This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effe...
Models for Sample Selection Bias
Journal
Journal title: Bioinformatics
Year: 2006
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/btl066